Constructing a PCA model

PCA models are used for two main purposes: 1) to reduce the number of dimensions of a dataset for process analysis and 2) to find the causes of error more efficiently.

It is often easier to describe a process in terms of only a few variables instead of all of the interacting, complex variables that make up most processes. A PCA model does this by reducing the number of dimensions in the process - it creates new variables that describe most of the activity in the process. These new variables are called principal components and are referred to as t0, t1, t2, etc., with t0 explaining the largest share of the variation in the process. The process variable that contributes most to a principal component is the variable to investigate first in order to reduce the process variation.

Principal components are not coupled to one another - there is no correlation between them, as there is between process variables. Principal components also differ from process variables in that they do not have units of measure.
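
The following is a minimal sketch of the idea described above, not the algorithm used by the software. It assumes NumPy is available, uses made-up process data (the names `temperature`, `pressure` and `flow` are hypothetical), and computes the principal components from the eigenvectors of the covariance matrix. It then checks that the resulting components are uncorrelated and finds the variable that contributes most to t0.

```python
import numpy as np

def pca_scores(X):
    """Return (scores, loadings, variances) for a data matrix X (rows = samples)."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    cov = np.cov(Xc, rowvar=False)           # covariance between the variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first, so t0 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # columns are t0, t1, t2, ...
    return scores, eigvecs, eigvals

rng = np.random.default_rng(0)
# Hypothetical process data: pressure follows temperature, flow is independent.
temperature = rng.normal(80.0, 5.0, 200)
pressure = 0.5 * temperature + rng.normal(0.0, 0.5, 200)
flow = rng.normal(10.0, 1.0, 200)
X = np.column_stack([temperature, pressure, flow])

scores, loadings, variances = pca_scores(X)

# Correlation between the principal components: off-diagonal entries are ~0.
score_corr = np.corrcoef(scores, rowvar=False)

# Index of the input variable with the largest absolute weight in t0.
top_variable = int(np.argmax(np.abs(loadings[:, 0])))
```

In this sketch the columns of `scores` are unitless combinations of the centred inputs, which is why principal components have no unit of measure.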

The PCA model uses process input variables only; therefore, no model targets are selected when creating the model.

Good practice:

It helps to order the data used for the PCA model by timestamp. Data is not necessarily ordered in the data source, and it is easier to view and understand in the PCA Simulation View when it is in time order.
Often the model is first created on the "golden batch" - a process run that you know was very successful and had very little deviation. This gives a good visual indication of how the data should look when viewed as a PCA model. It can then be compared with the PCA models of other datasets to see what changes need to be implemented in order to achieve results similar to those of the golden batch.
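
If the data is prepared outside the software, ordering it by timestamp can be sketched as follows. The row layout, the timestamp format and the function name `sort_by_timestamp` are assumptions made for illustration only.

```python
from datetime import datetime

# Hypothetical export: (timestamp, value) rows in arbitrary order.
rows = [
    ("2024-03-01 10:02:00", 81.2),
    ("2024-03-01 10:00:00", 79.8),
    ("2024-03-01 10:01:00", 80.5),
]

def sort_by_timestamp(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the rows ordered by their parsed timestamp (oldest first)."""
    return sorted(rows, key=lambda row: datetime.strptime(row[0], fmt))

ordered = sort_by_timestamp(rows)
```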

Note: In some instances, the PCA algorithm used in the software is not able to compute the eigenvectors that are needed to determine the principal components. This can be caused by the data itself: too few samples in the dataset, input variables that are highly correlated with one another, input variables that do not carry enough information, or a combination of these. It can be addressed by adding more samples to the dataset and/or removing the variables that contribute to the problem.
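
A rough pre-check for the conditions in the note above could look like the sketch below. It assumes NumPy; the function name `diagnose_inputs`, the rule "samples must outnumber variables" and the correlation limit of 0.99 are illustrative assumptions, not thresholds used by the software.

```python
import numpy as np

def diagnose_inputs(X, corr_limit=0.99):
    """Flag likely causes of a failed eigenvector computation (heuristic only)."""
    n_samples, n_vars = X.shape
    stds = X.std(axis=0)
    # A constant column carries no information about process variation.
    constant = [j for j in range(n_vars) if stds[j] == 0.0]
    varying = [j for j in range(n_vars) if stds[j] > 0.0]
    pairs = []
    if len(varying) >= 2:
        corr = np.corrcoef(X[:, varying], rowvar=False)
        for a in range(len(varying)):
            for b in range(a + 1, len(varying)):
                if abs(corr[a, b]) >= corr_limit:
                    pairs.append((varying[a], varying[b]))
    return {
        "too_few_samples": n_samples <= n_vars,  # assumed rule of thumb
        "constant": constant,
        "correlated_pairs": pairs,
    }

# Hypothetical data: column 1 is a linear function of column 0, column 2 is constant.
X = np.array([[1.0, 3.0, 5.0],
              [2.0, 5.0, 5.0],
              [3.0, 7.0, 5.0]])
result = diagnose_inputs(X)
```

Columns listed under `constant` or `correlated_pairs` are candidates for removal, and a true `too_few_samples` flag suggests adding more samples.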

Constructing a PCA model

On the Troubleshooting Project Bar, click on the modeling button. Note that this button is only available after the previous steps have been completed.
From the modeling view, click on [Construct] in the PCA group.
The configuration dialog for constructing the PCA model is shown in the figure below.
Select the field(s) to add and click on the add (>) button to add them to the selection.
Remove field(s) by selecting them and clicking on the remove (<) button.
To add all fields to the selection, click on the add all (>>) button.
To remove all fields from the selection, click on the remove all (<<) button.
Data Selection (data used for model construction):

- All data:
  
  Select all the data from the selected inputs.
- Inside History Brushing:
  
  Select only the data inside the region, marked in yellow.
- Outside History Brushing:
  
  Select only the data outside the region, marked in red.
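
The three data selection options can be pictured as a simple filter on a brushed time interval. The sketch below is only an illustration: the function name `select_rows`, the mode names and the inclusive treatment of the interval boundaries are assumptions, not documented behaviour of the software.

```python
def select_rows(rows, brush_start, brush_end, mode="all"):
    """Filter (time, value) rows by a brushed interval.

    mode is "all", "inside" or "outside" (illustrative names).
    """
    def in_brush(row):
        return brush_start <= row[0] <= brush_end  # boundaries assumed inclusive

    if mode == "all":
        return list(rows)
    if mode == "inside":
        return [row for row in rows if in_brush(row)]
    if mode == "outside":
        return [row for row in rows if not in_brush(row)]
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical data: time in minutes, one measured value per row.
rows = [(0, 79.8), (1, 80.5), (2, 95.0), (3, 96.1), (4, 80.2)]
inside = select_rows(rows, 2, 3, mode="inside")
outside = select_rows(rows, 2, 3, mode="outside")
```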

Normalise data:

Selecting this option normalises all data provided to the model. Normalising converts every data point to a value between 0 and 1, so that different variables can be compared without the influence of scale and units. You do not need to normalise the data if all inputs measure the same variable, that is, if you know that the units and the scale are already the same.
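
One common way to bring values into the 0 to 1 range is min-max scaling, sketched below. This is an assumption that is consistent with the description above; the exact formula used by the software is not stated here, and the function name `min_max_normalise` is hypothetical.

```python
def min_max_normalise(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every point to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical variables with very different units and scales.
temperature_c = [20.0, 25.0, 30.0]
pressure_kpa = [1000.0, 1500.0, 2000.0]

temperature_scaled = min_max_normalise(temperature_c)
pressure_scaled = min_max_normalise(pressure_kpa)
```

After scaling, both variables cover the same range, so neither dominates the model simply because its raw numbers are larger.
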
Confidence percentage:

The confidence percentage sets the percentage of the variance in the data that the principal components of the PCA model must explain.
Click Ok when done.

Clicking on [Modeling] on the Troubleshooting Project Bar lists the attributes of the PCA model. You will see the number of principal components created to explain the requested confidence. The actual confidence may differ slightly from the requested confidence; it shows how much of the variance is actually explained by that number of principal components. For the record, the data selection and whether the data was normalised are also listed.
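
The relationship between requested and actual confidence can be sketched as follows, assuming that confidence corresponds to the cumulative share of variance explained by the leading principal components. The function name `components_for_confidence` and the eigenvalues are made up for illustration.

```python
def components_for_confidence(eigenvalues, confidence_pct):
    """Return (count, actual_pct): the fewest components reaching the requested percentage."""
    total = sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)  # t0 first: largest variance
    for count, value in enumerate(ordered, start=1):
        running += value
        actual_pct = 100.0 * running / total
        if actual_pct >= confidence_pct:
            return count, actual_pct
    return len(ordered), 100.0

# Hypothetical variances of four principal components.
eigenvalues = [6.0, 2.5, 1.0, 0.5]
count, actual_pct = components_for_confidence(eigenvalues, 80.0)
```

With these made-up values, one component explains 60% and two explain 85%, so a request for 80% yields two components and an actual confidence of 85%. Components come in whole numbers, which is why the actual figure rarely matches the request exactly.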

Reconstruct PCA

This option allows the user to reconstruct the PCA model with different criteria for input fields, brushing area, confidence percentage and Normalise data.

Click on the Reconstruct PCA button.
Select the fields to use, data selection (All, inside or outside history brushing), normalise data and confidence percentage.
Click on Ok when done.